Skip to content

Conversation

@pajakd
Copy link
Collaborator

@pajakd pajakd commented Nov 5, 2025

Description

Issue

Testing

)

// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
type Type string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type Type string
type AcceleratorType string

Maybe like this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the slice API, this is just copying the canonical version of this API (exists in some doc), and we need to match that.

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this note related to Type. Can we add some space between to avoid confusion?

// Type specifies the type of accelerator used in this slice, e.g., "v6e", "tpu-v7x".
// +kubebuilder:validation:Immutable
Type string `json:"type"`
// +kubebuilder:validation:Enum=v6e;tpu-v7x
Copy link
Collaborator

@mbobrovskyi mbobrovskyi Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this validation to type?

)

// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
type Type string
Copy link
Collaborator

@mbobrovskyi mbobrovskyi Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this

Suggested change
type Type string
// +kubebuilder:validation:Enum=v6e;tpu-v7x
type Type string

TPUSliceHealthNodeSelectorValue = "true"

AcceleratorTpu7x = "tpu-v7x"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
)
const(


package core

type MMIGHealthStatus string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also move it to core.go.

func Deformed(slice *v1alpha1.Slice) bool {
return meta.IsStatusConditionTrue(slice.Status.Conditions, string(v1alpha1.Deformed))
func IsStale(slice *v1alpha1.Slice) bool {
cond := meta.FindStatusCondition(slice.Status.Conditions, string(v1alpha1.SliceStateConditionType))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cond := meta.FindStatusCondition(slice.Status.Conditions, string(v1alpha1.SliceStateConditionType))
cond := meta.FindStatusCondition(slice.Status.Conditions, v1alpha1.SliceStateConditionType)

}

func Deactivating(slice *v1alpha1.Slice) bool {
cond := meta.FindStatusCondition(slice.Status.Conditions, string(v1alpha1.SliceStateConditionType))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cond := meta.FindStatusCondition(slice.Status.Conditions, string(v1alpha1.SliceStateConditionType))
cond := meta.FindStatusCondition(slice.Status.Conditions, v1alpha1.SliceStateConditionType)

return cond != nil && cond.Status == metav1.ConditionFalse && time.Since(cond.LastTransitionTime.Time) >= ActivationTimeout
}

func Deactivating(slice *v1alpha1.Slice) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unused.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, removed.

return s
}

func (s *SliceWrapper) Type(Type string) *SliceWrapper {
Copy link
Collaborator

@mbobrovskyi mbobrovskyi Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (s *SliceWrapper) Type(Type string) *SliceWrapper {
func (s *SliceWrapper) Type(acceleratorType string) *SliceWrapper {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or

Suggested change
func (s *SliceWrapper) Type(Type string) *SliceWrapper {
func (s *SliceWrapper) Type(t string) *SliceWrapper {

func (s *SliceWrapper) Active() *SliceWrapper {
cond := metav1.Condition{
Type: string(v1alpha1.Ready),
Type: string(v1alpha1.SliceStateConditionType),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type: string(v1alpha1.SliceStateConditionType),
Type: v1alpha1.SliceStateConditionType,

cond := metav1.Condition{
Type: string(v1alpha1.Forming),
Status: metav1.ConditionTrue,
Type: string(v1alpha1.SliceStateConditionType),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type: string(v1alpha1.SliceStateConditionType),
Type: v1alpha1.SliceStateConditionType,

func (s *SliceWrapper) Degraded() *SliceWrapper {
cond := metav1.Condition{
Type: string(v1alpha1.Deformed),
Type: string(v1alpha1.SliceStateConditionType),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type: string(v1alpha1.SliceStateConditionType),
Type: v1alpha1.SliceStateConditionType,

cond := metav1.Condition{
Type: string(v1alpha1.Degraded),
Status: metav1.ConditionTrue,
Type: string(v1alpha1.SliceStateConditionType),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type: string(v1alpha1.SliceStateConditionType),
Type: v1alpha1.SliceStateConditionType,

Status: metav1.ConditionTrue,
LastTransitionTime: metav1.Now(),
Reason: "ByTest",
Type: string(v1alpha1.SliceStateConditionType),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Type: string(v1alpha1.SliceStateConditionType),
Type: v1alpha1.SliceStateConditionType,

TPUSliceHealthNodeSelectorKey = "cloud.google.com/gke-tpu-slice-4x4x4-health"
TPUSliceHealthNodeSelectorValue = "true"

AcceleratorTpu7x = "tpu-v7x"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have it in API, right? Can we remove it?

type Type string

const (
TypeV6e Type = "v6e"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TypeV6e Type = "v6e"
AcceleratorTypeV6e Type = "v6e"


const (
TypeV6e Type = "v6e"
TypeTpu7x Type = "tpu-v7x"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TypeTpu7x Type = "tpu-v7x"
AcceleratorTypeTpu7x Type = "tpu-v7x"

type Type string

const (
TypeV6e Type = "v6e"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we relax validation for it in IsValidTPUAccelerator()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to implement logic supporting v6 before relaxing this.

// SliceSpec defines the desired state of Slice.
type SliceSpec struct {
// AcceleratorType specifies the type of accelerator used in this slice.
// Type specifies the type of accelerator used in this slice, e.g., "v6e", "tpu-v7x".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Type specifies the type of accelerator used in this slice, e.g., "v6e", "tpu-v7x".
// type specifies the type of accelerator used in this slice, e.g., "v6e", "tpu-v7x".

// +kubebuilder:validation:Enum=v6e;tpu-v7x
Type Type `json:"type"`

// Topology represents the network topology of the slice.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Topology represents the network topology of the slice.
// topology represents the network topology of the slice.

// +kubebuilder:validation:Pattern=^\d+x\d+(x\d+)?$
Topology string `json:"topology"`

// Partition Ids denotes the set of partitions to use to form a slice
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Partition Ids denotes the set of partitions to use to form a slice
// partitionIds denotes the set of partitions to use to form a slice

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the same for status.

@PBundyra
Copy link
Collaborator

PBundyra commented Nov 7, 2025

Can we separate API changes and timeout logic into two PRs?

Copy link
Collaborator

@gabesaba gabesaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Only one significant comment: I propose keeping the smaller internal notion of slice state separate from the MMIGHealthState.

I can try to give a LGTM from KubeCon, once you address comments. If not, @PBundyra or @mbobrovskyi can

type Type string

const (
TypeV6e Type = "v6e"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to implement logic supporting v6 before relaxing this.

)

// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
type Type string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the slice API, this is just copying the canonical version of this API (exists in some doc), and we need to match that.

Comment on lines +25 to +28
var SliceStates = []MMIGHealthStatus{
MMIGHealthStatusActivating, MMIGHealthStatusActive, MMIGHealthStatusActiveDegraded,
MMIGHealthStatusDeactivating, MMIGHealthStatusFailed, MMIGHealthStatusIncomplete,
MMIGHealthStatusUnknown,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep separate the notion of slice state from MMIGHealthStatus. This fewer number of states will be more managable in the controller. WDYT?

We would have a function of Ready condtion's (Status, Reason) -> SliceState.

e.g.

(Unknown, "") -> ACCEPTED
(False, Incomplete) -> INITIALIZING
(False, Activating) -> INITIALIZING
...
(False, Failed) -> ERROR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants